ABSTRACT
When a new discipline emerges, it usually takes some time and considerable academic discussion before its concepts and terms are standardised. Text mining is one such discipline. In a groundbreaking paper, <i>Untangling text data mining</i>, Hearst [1999] tackled the problem of clarifying text-mining concepts and terminology. This essay, a conceptual study, builds on Hearst's ideas by pointing out some inconsistencies and suggesting an improved and extended categorisation of data- and text-mining techniques. It first gives a short overview of the problems regarding text-mining concepts, followed by a summary and critical discussion of Hearst's attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge.
- ALBRECHT, R. AND MERKL, D. 1998. Knowledge discovery in literature data bases. In Library and information services in astronomy III. (ASP conference series, vol. 153.) http://www.stsci.edu/stsci/meetings/lisa3/albrechtrl.html.
- BERSON, A. AND SMITH, S.J. 1997. Data warehousing, data mining, and OLAP. McGraw-Hill, New York, NY.
- BIGGS, M. 2000. Resurgent text-mining technology can greatly increase your firm's 'intelligence' factor. InfoWorld 11(2), 52.
- CHEN, H. 2001. Knowledge management systems: a text mining perspective. University of Arizona (Knowledge Computing Corporation), Tucson, Arizona.
- CORNFORD, T. AND SMITHSON, S. 1996. Project research in information systems: a student's guide. Macmillan, Houndmills. (Information system series.)
- HALLIMAN, C. 2001. Business intelligence using smart techniques: environmental scanning using text mining and competitor analysis using scenarios and manual simulation. Information Uncover, Houston, TX.
- HAN, J. AND KAMBER, M. 2001. Data mining: concepts and techniques. Morgan Kaufmann, San Francisco, CA.
- HEARST, M.A. 1999. Untangling text data mining. In Proceedings of ACL'99: the 37th annual meeting of the association for computational linguistics, University of Maryland, June 20-26 (invited paper). http://www.ai.mit.edu/people/jimmylin/papers/Hearst99a.pdf.
- HOVY, E. AND LIN, C.Y. 1999. Automated text summarization in SUMMARIST. In Advances in automated text summarization. I. MANI AND M.T. MAYBURY, Eds. MIT Press, MA, 81-94. http://www.isi.edu/~cyl/.
- KONTOS, J., MALAGARDI, I., ALEXANDRIS, C. AND BOULIGARAKI, M. 2000. Greek verb semantic processing for stock market text mining. In Proceedings of natural language processing: 2nd international conference, Patras, Greece, June 2000, D.N. CHRISTODOULAKIS, Ed. Springer, Berlin, 395-405. (Lecture notes in artificial intelligence, no. 1835.)
- LUCAS, M. 1999/2000. Mining in textual mountains, an interview with Marti Hearst. Mappa Mundi Magazine, Trip-M, 005, 1-3. http://mappa.mundi.net/trip-m/hearst/.
- MACK, R. AND HEHENBERGER, M. 2002. Text-based knowledge discovery: search and mining of life-science documents. Drug discovery today 7(11) (Suppl.), S89-S98.
- NASUKAWA, T. AND NAGANO, T. 2001. Text analysis and knowledge mining system. IBM Systems journal 40(4), 967-984.
- NEW ZEALAND DIGITAL LIBRARY, UNIVERSITY OF WAIKATO. 2002. Text mining. http://www.cs.waikato.ac.nz/~nzdl/textmining/.
- PERRIN, P. AND PETRY, F.E. 2003. Extraction and representation of contextual information for knowledge discovery in texts. Information sciences 151, 125-152.
- PONELIS, S. AND FAIRER-WESSELS, F.A. 1998. Knowledge management: a literature overview. South African journal of library and information science 66(1), 1-9.
- RAJMAN, M. AND BESANÇON, R. 1998. Text mining: natural language techniques and text mining applications. In Data mining and reverse engineering: searching for semantics, S. SPACCAPIETRA AND F. MARYANSKI, Eds. Chapman and Hall, London, 50-64.
- ROB, P. AND CORONEL, C. 2002. Database systems: design, implementation, and management, 5th ed. Course Technology, Boston, MA.
- STAIR, R.M. AND REYNOLDS, G.W. 2001. Principles of information systems: a managerial approach, 5th ed. Course Technology, Boston, MA.
- SULLIVAN, D. 2000. The need for text mining in business intelligence. DM Review, Dec. 2000. http://www.dmreview.com/master.cfm.
- SULLIVAN, D. 2001. Document warehousing and text mining: techniques for improving business operations, marketing, and sales. John Wiley, New York, NY.
- THURAISINGHAM, B. 1999. Data mining: technologies, techniques, tools, and trends. CRC Press, Boca Raton, Florida.
- WESTPHAL, C.R. AND BLAXTON, T. 1998. Data mining solutions: methods and tools for solving real-world problems. Wiley, New York, NY.
- ZORN, P., EMANOIL, M., MARSHALL, L. AND PANEK, M. 1999. Mining meets the web. Online 23(5), 17-28.
Index Terms
- Differentiating data- and text-mining terminology